Rust metrics #17

asteurer · 2025-09-08T23:02:11Z

When you get a chance, will you review and let me know if this is good to merge?

Signed-off-by: Andrew Steurer <[email protected]>

asteurer · 2025-10-09T00:43:03Z

Closes #8 (probably)

calebschoepp

Ran this locally and it worked. Very exciting!! I think we have quite a bit of iteration to do before this is ready. My first concern is getting the WIT right; the Spin and opentelemetry-wasi implementations will fall out of that.

Still loading all this stuff back into my head after a long time away, so this is just a preliminary round of comments and questions to get us started.

It might speed up this whole process if you could provide, in a paragraph or two, some color on how you ended up with the current WIT so I have more context.

Good work. Excited to see all this.

wit/metrics.wit

rust/examples/spin-metrics/src/lib.rs

asteurer · 2025-10-15T20:45:06Z

My process for building this came from looking closely at the opentelemetry-rust implementation of the ResourceMetrics type and attempting to translate it into WIT. I didn't spend much time looking at other language implementations of ResourceMetrics, so that might be a good next step for further refining the WIT.

Let me know if you need more context.

asteurer · 2025-10-15T22:47:21Z

In case there are concerns about me referencing an unstable API for metrics, looks like the folks in the otel-rust group in the CNCF slack declared metrics and logs stable: https://cloud-native.slack.com/archives/C03GDP0H023/p1759196279655829?thread_ts=1759196007.336079&cid=C03GDP0H023

calebschoepp · 2025-10-16T18:21:54Z

Spent some time reading the spec¹, looking at what you have, and pondering. I think it would be useful to back up and look at this from a 10,000-foot view.

10,000 foot view

At the highest level, we have a component with a bunch of metrics that need to make it out of the component (typically into the host runtime, but possibly into a parent component in the case of composition). There are two ways we can do this: push or pull. Push means the guest decides when to send its metrics to the host; for example, this is how wasi-otel traces work. Pull means the host decides when to get metrics from a guest.

Pull

Pull means that the host decides when to get metrics from a guest. Some pseudo-WIT might look like:

export collect: func() -> result<metrics, error>;

The guest would provide an implementation of this collect function and the host would get to call it whenever it wants² to collect the guest metrics.

On the opentelemetry-wasi side of things (ref) we would likely model it as something like WasiPullMetricExporter that implements the MetricReader trait/interface.
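Here is a minimal std-only Rust sketch of the pull shape. It uses simplified, hypothetical types (`Metrics`, `GuestMetrics`, `CountingGuest`, `host_poll`) rather than real WIT bindings or the opentelemetry SDK traits. It shows who holds the clock: the guest only answers `collect`, and the host chooses when to call it.

```rust
/// Hypothetical, simplified stand-in for the `metrics` record a guest would return.
#[derive(Debug, Clone, PartialEq)]
pub struct Metrics {
    pub points: Vec<(String, u64)>,
}

/// Guest side of the pseudo-WIT `export collect: func() -> result<metrics, error>`.
/// A plain `String` stands in for the error type.
pub trait GuestMetrics {
    fn collect(&mut self) -> Result<Metrics, String>;
}

/// Toy guest: counts requests and reports the running total when asked.
#[derive(Default)]
pub struct CountingGuest {
    requests: u64,
}

impl CountingGuest {
    pub fn handle_request(&mut self) {
        // Recording is purely guest-local; nothing crosses to the host here.
        self.requests += 1;
    }
}

impl GuestMetrics for CountingGuest {
    fn collect(&mut self) -> Result<Metrics, String> {
        Ok(Metrics {
            points: vec![("requests".to_string(), self.requests)],
        })
    }
}

/// Host side: the host decides when (and how often) to pull.
pub fn host_poll<G: GuestMetrics>(guest: &mut G) -> Result<Metrics, String> {
    guest.collect()
}
```

In the real design, the polling cadence would come from whatever convention wasi-otel settles on (see footnote 2).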

A lot could be said about the pros and cons of the pull architecture, but the best I have right now is that my gut tells me we should explore push first. I imagine that in the future we may need to support both push and pull for different use cases.

Push

Push means the guest decides when to send its metrics to the host. Some pseudo-WIT might look like:

import export: func(my-metrics: metrics) -> result<_, error>;

The most naive way we could model this in opentelemetry-wasi is to have each instrument immediately export its data when it is called. That bypasses the aggregation the OpenTelemetry metrics SDK normally performs. This is potentially simpler, but I don't think we should do it. Metrics are commonly used in performance-critical code, where you don't want to be jumping between the host and guest all the time.
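To make the cost argument concrete, here is a hypothetical sketch in which `Host::export` stands in for the imported WIT `export` function and simply counts boundary crossings. The aggregation is just a sum, not the SDK's aggregation machinery, but it shows the shape of the difference.

```rust
/// Hypothetical stand-in for the host boundary; counts crossings.
#[derive(Default, Debug)]
pub struct Host {
    pub crossings: u32,
    pub total: u64,
}

impl Host {
    /// Stand-in for the imported WIT `export` function.
    pub fn export(&mut self, value: u64) {
        self.crossings += 1;
        self.total += value;
    }
}

/// Naive push: every measurement is exported immediately (one crossing each).
pub fn record_naive(host: &mut Host, measurements: &[u64]) {
    for &m in measurements {
        host.export(m);
    }
}

/// Aggregated push: sum inside the guest, then cross the boundary once.
pub fn record_aggregated(host: &mut Host, measurements: &[u64]) {
    let sum: u64 = measurements.iter().sum();
    host.export(sum);
}
```

Both approaches deliver the same total, but the naive one crosses the boundary once per measurement.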

A less naive way to model this would be to have something like WasiPushMetricExporter that implements the MetricExporter trait/interface. A MetricExporter doesn't have a way to get at the aggregated metrics the OTel SDK is holding, though, so we need to combine it with a MetricReader that can. In normal OTel land you would do this with some kind of PeriodicReader that embeds a MetricExporter: on a regular interval, it reads the metrics and pushes them out with the exporter.
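For reference, the PeriodicReader pattern boils down to a background thread that periodically drains aggregated state and hands it to an exporter. The sketch below uses simplified, hypothetical types (`Exporter`, `Aggregates`, `CollectingExporter`, `spawn_periodic`), not the opentelemetry_sdk API. It runs a fixed number of ticks so it terminates. The background thread is exactly the piece that is hard to get in a pre-P3 component; as far as I'm aware, `std::thread::spawn` is not supported on the `wasm32-wasip2` target.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Simplified, hypothetical exporter trait (not the SDK's exporter trait).
pub trait Exporter: Send + 'static {
    fn export(&mut self, batch: Vec<u64>);
}

/// Guest-side aggregation state that instruments write into.
#[derive(Default)]
pub struct Aggregates {
    pub values: Vec<u64>,
}

/// Test exporter that records each batch it receives.
pub struct CollectingExporter {
    pub batches: Arc<Mutex<Vec<Vec<u64>>>>,
}

impl Exporter for CollectingExporter {
    fn export(&mut self, batch: Vec<u64>) {
        self.batches.lock().unwrap().push(batch);
    }
}

/// Core of a periodic reader: a background thread that, `ticks` times,
/// sleeps for `interval`, drains the aggregates, and pushes them out.
pub fn spawn_periodic<E: Exporter>(
    state: Arc<Mutex<Aggregates>>,
    mut exporter: E,
    interval: Duration,
    ticks: usize,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for _ in 0..ticks {
            thread::sleep(interval);
            // Hold the lock only long enough to drain; export without holding it.
            let batch = {
                let mut guard = state.lock().unwrap();
                std::mem::take(&mut guard.values)
            };
            exporter.export(batch);
        }
    })
}
```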

In our pre-WASI-P3 world, though, it is hard to build a PeriodicReader because we don't really have threading, and async is a bit of a nightmare. We could use the ManualReader, but it doesn't embed a MetricExporter, so it won't export anything automatically out of the box. This leaves us with two options:

1. Create a CustomManualReaderThatIsToUnblockUsButTheoreticallyUsefulElsewhere that is just a ManualReader, but embeds a MetricExporter and exports when you run collect.
2. Tell the consumer of opentelemetry-wasi to wire up a ManualReader and the WasiPushMetricExporter themselves.
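A rough sketch of option 1 follows, using simplified, hypothetical types (`Exporter`, `ExportingManualReader`, `VecExporter`). The real MetricReader/MetricExporter traits in opentelemetry_sdk have different signatures. Collecting and exporting happen in a single call. The counters are not reset, so this models cumulative temporality.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical exporter trait; the SDK's real trait differs.
pub trait Exporter {
    fn export(&mut self, batch: &[(String, u64)]) -> Result<(), String>;
}

/// Option 1: a manual reader that owns an exporter and exports on `collect`.
pub struct ExportingManualReader<E: Exporter> {
    // BTreeMap keeps export order deterministic.
    counters: BTreeMap<String, u64>,
    exporter: E,
}

impl<E: Exporter> ExportingManualReader<E> {
    pub fn new(exporter: E) -> Self {
        Self { counters: BTreeMap::new(), exporter }
    }

    /// Guest-local aggregation; no host call happens here.
    pub fn add(&mut self, name: &str, value: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += value;
    }

    /// Snapshot the aggregates and export them in one step. Counters are not
    /// reset, so repeated calls report cumulative totals.
    pub fn collect(&mut self) -> Result<(), String> {
        let batch: Vec<(String, u64)> =
            self.counters.iter().map(|(k, v)| (k.clone(), *v)).collect();
        self.exporter.export(&batch)
    }

    pub fn exporter(&self) -> &E {
        &self.exporter
    }
}

/// Test exporter that records every batch.
#[derive(Default)]
pub struct VecExporter {
    pub exported: Vec<Vec<(String, u64)>>,
}

impl Exporter for VecExporter {
    fn export(&mut self, batch: &[(String, u64)]) -> Result<(), String> {
        self.exported.push(batch.to_vec());
        Ok(())
    }
}
```

The consumer-facing surface is a single `collect()` call. The trade-off is a custom reader type that opentelemetry-wasi would have to maintain.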

I don't know which is the right choice. It's worth noting that in either pattern we would basically be telling the user to make sure they run some version of exporter.export(reader.collect()) at the end of their component code. This means we're hosed for component dependencies, because how does a component dependency know when to export³?
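Option 2 looks like the following: the consumer wires a reader and an exporter together and runs the equivalent of `exporter.export(reader.collect())` at the end of the handler. All names here (`ToyManualReader`, `PushExporter`, `handle_request`) are hypothetical stand-ins, not the real SDK or Spin types. The comment on the export line marks the catch described above: nothing enforces that call, and a component dependency has no natural "end" at which to make it.

```rust
use std::collections::BTreeMap;

/// Hypothetical manual reader: aggregates locally and hands out a snapshot
/// on `collect`, but never exports by itself.
#[derive(Default)]
pub struct ToyManualReader {
    counters: BTreeMap<String, u64>,
}

impl ToyManualReader {
    pub fn add(&mut self, name: &str, value: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += value;
    }

    pub fn collect(&self) -> Vec<(String, u64)> {
        self.counters.iter().map(|(k, v)| (k.clone(), *v)).collect()
    }
}

/// Hypothetical push exporter; in a component, this is where the imported
/// WIT `export` function would be called.
#[derive(Default)]
pub struct PushExporter {
    pub sent: Vec<Vec<(String, u64)>>,
}

impl PushExporter {
    pub fn export(&mut self, batch: Vec<(String, u64)>) -> Result<(), String> {
        self.sent.push(batch);
        Ok(())
    }
}

/// A request handler in the shape described above: do the work, then
/// export once at the very end.
pub fn handle_request(
    reader: &mut ToyManualReader,
    exporter: &mut PushExporter,
) -> Result<String, String> {
    reader.add("requests", 1);
    let body = String::from("hello");
    // Nothing enforces this line; if the consumer forgets it, no metrics leave the component.
    exporter.export(reader.collect())?;
    Ok(body)
}
```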

The world is complicated

Here are a few things, in no particular order, that complicate the design space:

- WASI P3 is close, but not fully here yet. Given this, I don't really want to design for a P2 world, but it is hard to design for a P3 world that has not fully arrived.
- Theoretically, we want wasi-otel to support all Wasm use cases, e.g. long-running instances, component dependencies, etc. In practice, we should probably just design for a simpler FaaS, single-instance kind of use case and then expand from there.
- We have approximately no input from end users on how they want to do metrics (push/pull). This is another thing pushing us toward action and iteration rather than trying to find the perfect solution.

Looking at your current implementation

Let's interpret your current implementation through the lens I've laid out so far.

It seems like you've modelled it on opentelemetry-prometheus (which is pull) but made it push. As far as I can tell, WasiMetricExporter doesn't actually do anything, and WasiMetricCollector is basically acting as a MetricExporter without actually implementing the push trait.

My understanding of the necessity of WasiMetricExporter may be wrong though so I would love to be corrected.

What do I think we should do

We don't have all the info to make a perfect decision, but here is what I think we should do.

1. Go for the push pattern.
   - Play around to see whether option 1 or 2 is more ergonomic with regard to readers.
2. Once it's working, get it merged. That's probably enough progress to unblock landing this stuff in Spin and making phase progress with the WASI proposal.
3. In the background, keep exploring what a pull-based implementation would look like.

Footnotes

1. All of API, data model, and SDK, but the SDK was most relevant.
2. In practice, wasi-otel would likely outline semantic conventions for when a host should poll for metrics.
3. This will eventually be fixed by WASI P3. It's also not a problem in the pull model. Or you could just always push, which is also kind of a sucky solution.

asteurer · 2025-10-16T23:31:13Z

Edit: No longer relevant

Signed-off-by: Andrew Steurer <[email protected]>

rust/src/metrics/reader.rs

calebschoepp

Still not a full review, but going to stop here b/c I don't want to get into the nits until we address this core stuff

calebschoepp · 2025-10-17T21:42:52Z

wit/metrics.wit

+    /// `collect` gathers all metric data related to a Reader from the SDK
+    collect: func(metrics: resource-metrics) -> result<_, otel-error>;


I would expect this to be called export now.

calebschoepp · 2025-10-17T21:44:03Z

wit/metrics.wit

+    variant otel-error {
+        already-shutdown,
+        timeout(duration),
+        internal-failure(string),
+    }


I'm skeptical that we actually observe this set of errors with our setup. Are you sure this is the exact set of possible errors, and that we want to bake it into the WIT? Or should we just make the error a string for now?
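One way to weigh the variant against a plain string is to mirror the `otel-error` variant in Rust and look at what survives if the WIT used `result<_, string>` instead. `OtelError` and `to_wit_string` are illustrative names, not generated bindings.

```rust
use std::fmt;
use std::time::Duration;

/// Illustrative Rust mirror of the WIT `otel-error` variant under review.
#[derive(Debug, Clone, PartialEq)]
pub enum OtelError {
    AlreadyShutdown,
    Timeout(Duration),
    InternalFailure(String),
}

impl fmt::Display for OtelError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OtelError::AlreadyShutdown => write!(f, "already shut down"),
            OtelError::Timeout(d) => write!(f, "timed out after {} ms", d.as_millis()),
            OtelError::InternalFailure(msg) => write!(f, "internal failure: {msg}"),
        }
    }
}

/// With `result<_, string>`, only this rendering would cross the boundary.
/// The host could log it but no longer match on the case (e.g. retry on timeout).
pub fn to_wit_string(err: &OtelError) -> String {
    err.to_string()
}
```

Starting with a string keeps the WIT flexible. A variant can be introduced later if a host actually needs to branch on the error kind.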

calebschoepp · 2025-10-17T21:44:55Z

wit/metrics.wit

+    /// The WASI representation of the `OTelSdkError`.
+    ///
+    /// See https://github.com/open-telemetry/opentelemetry-rust/blob/353bbb0d80fc35a26a00b4f4fed0dcaed23e5523/opentelemetry-sdk/src/error.rs#L15
+    variant otel-error {


Nit: Redundant to say something is a WASI X in a WIT file.

Nit: Don't love us linking to the Rust SDK as opposed to some canonical OTel reference spec page.

calebschoepp · 2025-10-17T21:45:20Z

wit/metrics.wit

+    /// Aggregated metrics data from an instrument.
+    variant aggregated-metrics {
+        /// All metric data with `f64` value type.
+        %f64(metric-data),
+        /// All metric data with `u64` value type.
+        %u64(metric-data),
+        /// All metric data with `s64` value type.
+        %s64(metric-data),
+    }


I don't really understand why this variant is necessary.
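Here is a guess at the motivation, assuming the variant mirrors a Rust SDK type that is generic over the value type. WIT has no generics, so the number type has to be fixed somewhere. The sketch contrasts two options:

- fixing the type once per metric, as the variant under review does;
- tagging each data point, roughly how OTLP's NumberDataPoint handles it with a per-point double/int `oneof`.

All type names here are hypothetical.

```rust
/// Option A: the number type is chosen once per metric (like the variant under review).
#[derive(Debug, Clone, PartialEq)]
pub struct MetricData<T> {
    pub points: Vec<T>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AggregatedMetrics {
    F64(MetricData<f64>),
    U64(MetricData<u64>),
    S64(MetricData<i64>),
}

/// Option B: a non-generic data type where each point carries its own number type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NumberValue {
    F64(f64),
    U64(u64),
    S64(i64),
}

#[derive(Debug, Clone, PartialEq)]
pub struct FlatMetricData {
    pub points: Vec<NumberValue>,
}

/// Converting A into B is lossless; going back requires checking that all
/// points share one type, which option A guarantees by construction.
pub fn flatten(m: &AggregatedMetrics) -> FlatMetricData {
    let points = match m {
        AggregatedMetrics::F64(d) => d.points.iter().map(|&v| NumberValue::F64(v)).collect(),
        AggregatedMetrics::U64(d) => d.points.iter().map(|&v| NumberValue::U64(v)).collect(),
        AggregatedMetrics::S64(d) => d.points.iter().map(|&v| NumberValue::S64(v)).collect(),
    };
    FlatMetricData { points }
}
```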

rust/examples/spin-metrics/src/lib.rs

rust/src/metrics/reader.rs

calebschoepp · 2025-10-17T22:01:30Z

rust/src/metrics/reader.rs

+    }
+}
+
+impl MetricReader for WasiMetricReader {


Maybe put the comment I suggested about why we have a manual reader here, to explain all these simple impls 🤷?

calebschoepp · 2025-10-17T22:01:56Z

rust/src/metrics/reader.rs

+    fn temporality(&self, kind: InstrumentKind) -> Temporality {
+        match kind {
+            InstrumentKind::ObservableCounter
+            | InstrumentKind::ObservableGauge
+            | InstrumentKind::ObservableUpDownCounter => {
+                panic!("Async InstrumentKinds are not yet supported");
+            }
+            _ => self.reader.temporality(kind),
+        }


Why aren't they supported?

My understanding of observable instruments is that they periodically export in the background. Using a ManualReader to export observable instruments would mean that metrics would be sent all at once, effectively making them non-observable counters. I think it would be confusing for people to be able to use observable counters and not have them work how they expect.

calebschoepp · 2025-10-17T22:05:47Z

rust/examples/spin-metrics/Cargo.toml

@@ -0,0 +1,17 @@
+[package]


Do you think having the tracing and metrics examples live in the same example would be nicer?

I'm not sure! Part of me likes the clear separation of examples; however, I can see how keeping everything together might make for a more interesting example. I'm open to either.

feat(metrics): adding metrics to the SDK

de3c34b

Signed-off-by: Andrew Steurer <[email protected]>

calebschoepp requested changes Oct 15, 2025

wit/metrics.wit (resolved)

wit/metrics.wit (outdated, resolved)

rust/examples/spin-metrics/src/lib.rs (outdated, resolved)

rust/examples/spin-metrics/src/lib.rs (outdated, resolved)

asteurer force-pushed the rust-metrics branch 2 times, most recently from 97a1dd7 to 6435fa9 on October 16, 2025 23:28

asteurer requested a review from calebschoepp October 16, 2025 23:28

fix(metrics): misc refactors

bf8425f

Signed-off-by: Andrew Steurer <[email protected]>

asteurer force-pushed the rust-metrics branch from 6435fa9 to bf8425f on October 17, 2025 01:56

asteurer commented Oct 17, 2025

rust/src/metrics/reader.rs (resolved)

calebschoepp requested changes Oct 17, 2025
